-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE][GLUTEN-3936] Support collapse project transformer #3937
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
5b46463
to
5cd71f8
Compare
Run Gluten Clickhouse CI |
5cd71f8
to
1d9199f
Compare
Run Gluten Clickhouse CI |
1d9199f
to
d75567e
Compare
Run Gluten Clickhouse CI |
@ulysses-you Could you help review? |
p2.projectList, | ||
alwaysInline = false) => | ||
p2.copy(projectList = | ||
CollapseProjectShim.buildCleanedProjectList(p1.projectList, p2.projectList)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be conservative, better to check if the collapsed project can be transformed, and fallback to original plan if fail to validate. Also, do not forget to copy tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not forget to copy tags.
Do you mean that after executing val newProject = p2.copy
, I still need to call newProject.copyTagsFrom(p1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems there is no way to show the fallback reason for a transform node.. so it's ok to skip copying tags.
|LIMIT | ||
| 100; | ||
|""".stripMargin | ||
withSQLConf(GlutenConfig.ENABLE_COLUMNAR_PROJECT_COLLAPSE.key -> "false") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about use a loop to combine two test cases Seq(true, false).foreach { collapsed => }
.internal() | ||
.doc("Combines two columnar project operators into one and perform alias substitution") | ||
.booleanConf | ||
.createWithDefault(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should enable it by default and pass CI to see if there are some bad case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally it was enabled by default, but the unit test for ck failed because some tests in ck directly used the index to retrieve the plan. After my optimization, the number of plans decreased, causing an IndexOutOfBoundsException in the unit test for ck. If it's just for validation, it can be enabled by default and then disabled later.
00:39:58 16:39:58.131 WARN io.glutenproject.metrics.MetricsUtil: Updating native metrics failed due to the wrong size of metrics data: 7
00:39:58 - Check TPCH Q2 metrics updater *** FAILED ***
00:39:58 java.lang.IndexOutOfBoundsException: 10
00:39:58 at scala.collection.mutable.ResizableArray.apply(ResizableArray.scala:46)
00:39:58 at scala.collection.mutable.ResizableArray.apply$(ResizableArray.scala:45)
00:39:58 at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:49)
00:39:58 at io.glutenproject.execution.metrics.GlutenClickHouseTPCHMetricsSuite.$anonfun$new$9(GlutenClickHouseTPCHMetricsSuite.scala:210)
00:39:58 at io.glutenproject.execution.metrics.GlutenClickHouseMetricsUTUtils$.executeMetricsUpdater(GlutenClickHouseMetricsUTUtils.scala:117)
00:39:58 at io.glutenproject.execution.metrics.GlutenClickHouseTPCHMetricsSuite.$anonfun$new$7(GlutenClickHouseTPCHMetricsSuite.scala:205)
00:39:58 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
00:39:58 at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
00:39:58 at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
00:39:58 at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just fix it, e.g., change 10 to 9 if we have collapsed two projects into one. I think the code change should be test only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to put it in spark-ut module as it affects both velox and ch backend.
thank you @liujiayi771 , the idea looks good to me |
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
ac9f283
to
5ecd2f5
Compare
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
5e41c3a
to
b5c01cb
Compare
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
3c64749
to
f9d90a9
Compare
Run Gluten Clickhouse CI |
f9d90a9
to
6ac7423
Compare
Run Gluten Clickhouse CI |
6ac7423
to
895bf89
Compare
Run Gluten Clickhouse CI |
895bf89
to
03f5f4b
Compare
Run Gluten Clickhouse CI |
03f5f4b
to
db9a5ab
Compare
Run Gluten Clickhouse CI |
db9a5ab
to
1f0e0a0
Compare
Run Gluten Clickhouse CI |
1f0e0a0
to
68055d2
Compare
Run Gluten Clickhouse CI |
68055d2
to
1ab6946
Compare
Run Gluten Clickhouse CI |
@ulysses-you Ready for review. CK's UT needs to be adapted when the optimization is enabled by default, but this optimization will not be enabled by default later on. Unlike Velox, CK does not insert a project to calculate hash value before shuffle, so there are very few cases that meet the criteria for project collapsing. Currently, all UTs can pass. After completing the review, I will turn off the parameter by default and roll back the modifications made to CK's UT. |
I think enabled it by default is fine even if the clickhouse backend has less case can be optimized. |
/Benchmark Velox |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, cc @zzcclp if you have other comments
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
What changes were proposed in this pull request?
Fix #3936
How was this patch tested?
Add CollapseProjectExecTransformerSuite